{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Python 机器学习实战 ——代码样例\n",
    "\n",
    "# 第六章 Collections 库"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.1.\tnamedtuple\n",
    "\n",
    "namedtuple 主要用来生成可以使用名称来访问元素的数据对象，通常用来增强代码的可读性，在访问一些 tuple 类型的数据时尤其好用。\n",
    "\n",
    "namedtuple 是一个函数，它用来创建一个自定义的 tuple 对象，并且规定了 tuple 元素的个数，可以用属性而不是索引来写入或者访问 tuple 的某个元素。\n",
    "\n",
    "这样一来，我们用 namedtuple 可以很方便地定义一种数据类型，比如 XY 坐标，它具备 tuple 的内容不变性，又可以根据属性来引用，使用十分方便。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 2\n"
     ]
    }
   ],
   "source": [
    ">>> from collections import namedtuple\n",
    "\n",
    ">>> Point = namedtuple('Point', ['x', 'y'])\n",
    ">>> p = Point(1, 2)\n",
    "\n",
    ">>> print(p.x, p.y)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "web(name='google', type='search', url='www.google.com')\n",
      "google www.google.com\n",
      "www.google.com www.sina.com.cn\n"
     ]
    }
   ],
   "source": [
    ">>> Web = namedtuple('web', ['name', 'type', 'url'])\n",
    ">>> p1 = Web('google', 'search', 'www.google.com')\n",
    ">>> p2 = Web('sina', 'portal', 'www.sina.com.cn')\n",
    "\n",
    ">>> print(p1)\n",
    ">>> print(p1.name, p1.url)\n",
    ">>> print(p1.url, p2.url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sina\n",
      "portal\n",
      "www.sina.com.cn\n"
     ]
    }
   ],
   "source": [
    ">>> for i in p2:\n",
    "...    print(i)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.2. deque\n",
    "\n",
    "使用 list 存储数据时，按索引访问元素很快，但是插入和删除元素就很慢了，因为 list 是线性存储，数据量大的时候，插入和删除效率很低。\n",
    "\n",
    "deque 是为了实现高效插入和删除操作的双向列表，适合用于队列和栈。\n",
    "\n",
    "deque 在插入数据时候速度比 list 快很多，当然这个是相对存在大量数据的 list 而言的。如果你的程序需要对有百万级别数据量的 list 频繁的在各个位置插入删除数据，那么用 deque 是值得的。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "deque(['y', 'a', 'b', 'c', 'x'])\n"
     ]
    }
   ],
   "source": [
    ">>> from collections import deque\n",
    ">>> import time\n",
    "\n",
    ">>> q = deque(['a', 'b', 'c'])\n",
    ">>> q.append('x')\n",
    ">>> q.appendleft('y')\n",
    "\n",
    ">>> print(q)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对比 deque与 list 的速度，对含有 1 亿个元素的 list 执行插入，查看函数执行时间："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.12650108337402344\n"
     ]
    }
   ],
   "source": [
    ">>> q0 = [x*x for x in range(100000000)]\n",
    ">>> a = time.time()\n",
    ">>> q0.insert(0,888)\n",
    ">>> b = time.time()\n",
    ">>> print(b-a)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对含有 1 亿个元素的 deque执行插入，查看函数执行时间："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.0\n"
     ]
    }
   ],
   "source": [
    ">>> q1= deque(q0)\n",
    ">>> a = time.time()\n",
    ">>> q1.appendleft(888)\n",
    ">>> b = time.time()\n",
    ">>> print(b-a)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " deque翻转操作："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "deque(['d', 'e', 'a', 'b', 'c'])\n",
      "deque(['a', 'b', 'c', 'd', 'e'])\n"
     ]
    }
   ],
   "source": [
    ">>> l = ['a','b','c','d','e']\n",
    ">>> l = deque(l)\n",
    ">>> l.rotate(2)\n",
    ">>> print(l)\n",
    ">>> l.rotate(-2)\n",
    ">>> print(l)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "deque使用 pop() 函数同样可以区分头尾，如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "deque(['a', 'b'])\n",
      "deque(['b', 'c'])\n"
     ]
    }
   ],
   "source": [
    ">>> l = deque(['a','b','c'])\n",
    ">>> l.pop()\n",
    ">>> print(l)\n",
    "\n",
    "\n",
    ">>> l = deque(['a','b','c'])\n",
    ">>> l.popleft()\n",
    ">>> print(l)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.3.\tdefaultdict\n",
    "\n",
    "我们都知道，在使用 Python 原生的数据结构 dict 的时候，如果用 d[key] 这样的方式访问，当指定的 key 不存在时，会抛出 KeyError 异常，也就是发生错误 ( 当然可以用 get 方法来避免这样的错误 )。 \n",
    "\n",
    "如果使用 defaultdict，只要你传入一个默认的方法，那么请求一个不存在的 key 时， 便会调用这个方法，使用其结果来作为这个 key 的默认值。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "David\n"
     ]
    }
   ],
   "source": [
    ">>> from collections import defaultdict\n",
    ">>> i = defaultdict(lambda: 100)\n",
    ">>> i['name']='David'\n",
    ">>> print(i['name']) \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "default 返回默认值，不会报错。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100\n"
     ]
    }
   ],
   "source": [
    ">>> print(i['score'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.4.\tOrderedDict\n",
    "\n",
    "使用 dict 字典时，Key 是无序的。在对 dict 做迭代访问时，我们无法确定 Key 的顺序。\n",
    "\n",
    "如果要保持 Key 的顺序，可以用 OrderedDict，这是一个 Key 值有序的字典数据类型。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "传统 dict 是无序的："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'a': 1, 'c': 3, 'b': 2}\n"
     ]
    }
   ],
   "source": [
    ">>> d = dict([('a', 1), ('b', 2), ('c', 3)])\n",
    ">>> print(d)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "追加一对 key value：\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'a': 1, 'c': 3, 'd': 4, 'b': 2}\n"
     ]
    }
   ],
   "source": [
    ">>> d['d'] = 4\n",
    ">>> print(d)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用 OrderedDict："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "OrderedDict([('a', 1), ('b', 2), ('c', 3)])\n"
     ]
    }
   ],
   "source": [
    ">>> from collections import OrderedDict\n",
    "\n",
    ">>> d = OrderedDict()\n",
    ">>> d['a'] = 1\n",
    ">>> d['b'] = 2\n",
    ">>> d['c'] = 3\n",
    "\n",
    ">>> print(d)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用 OrderedDict, 追加一对 key value。OrderedDict 的 Key 会按照插入的顺序排列，不是 Key 本身排序。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])\n"
     ]
    }
   ],
   "source": [
    ">>> d['d'] = 4\n",
    ">>> print(d)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6.5.\tCounter\n",
    "\n",
    "Counter提供了一个简单的计数器功能。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Please input:abcddee\n"
     ]
    }
   ],
   "source": [
    ">>> from collections import Counter\n",
    ">>> s = input('Please input:')\n",
    ">>> s = s.lower()\n",
    ">>> c = Counter(s)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "获取出现频率最高的 5 个字符："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('e', 2), ('d', 2), ('a', 1), ('c', 1), ('b', 1)]\n"
     ]
    }
   ],
   "source": [
    ">>> print(c.most_common(5)) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "计数值的访问与缺失的键："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "1\n",
      "0\n"
     ]
    }
   ],
   "source": [
    ">>> print(c['a'])\n",
    ">>> print(c['b'])\n",
    ">>> print(c['1'])\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
